feat(go): add Go deserialization support via io streams #3374
chaokunyang merged 40 commits into apache:main
Conversation
Hey @chaokunyang
hey @ayush00git, looked through this and the main issue i see is in `DeserializeFromReader`:

```go
func (f *Fory) DeserializeFromReader(r io.Reader, v any) error {
	defer f.resetReadState()
	f.readCtx.buffer.ResetWithReader(r, 0) // this wipes the prefetch window every time
```

so if fill() reads ahead past the first object boundary (which it will), those bytes are lost on the next call:

```go
for {
	var msg Msg
	f.DeserializeFromReader(conn, &msg) // bytes after first object get thrown away
}
```

if you look at how he handles this for c++/python, the Buffer is constructed once and kept alive across reads. the go version probably needs something similar: a stream reader type that owns the buffered window across calls. Happy to discuss if I'm misreading the flow here.
Hiii @Zakir032002
hey @ayush00git, one more thing: ReadBinary and ReadBytes return a direct slice into the internal buffer. the problem is fill() compacts the buffer in-place, so if someone reads a []byte field and holds onto that slice, the next fill overwrites the bytes underneath it. in stream mode you probably want to copy before returning instead of aliasing; the in-memory path stays as is.
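The aliasing hazard is easy to reproduce in isolation. A minimal sketch (hypothetical `miniBuf` type, standing in for the real buffer) where `compact` simulates fill() sliding unread bytes to the front in place:

```go
package main

import "fmt"

// miniBuf is a hypothetical stand-in for the streaming buffer.
type miniBuf struct {
	data    []byte
	readIdx int
}

// readBytesAliased returns a slice sharing the buffer's backing array.
func (b *miniBuf) readBytesAliased(n int) []byte {
	out := b.data[b.readIdx : b.readIdx+n]
	b.readIdx += n
	return out
}

// readBytesCopied copies before returning, so compaction can't clobber it.
func (b *miniBuf) readBytesCopied(n int) []byte {
	out := make([]byte, n)
	copy(out, b.data[b.readIdx:b.readIdx+n])
	b.readIdx += n
	return out
}

// compact simulates fill() moving unread bytes to the front in place.
func (b *miniBuf) compact() {
	copy(b.data, b.data[b.readIdx:])
	b.data = b.data[:len(b.data)-b.readIdx]
	b.readIdx = 0
}

func main() {
	b := &miniBuf{data: []byte("hello-world")}
	aliased := b.readBytesAliased(5) // "hello", but aliases b.data
	b.compact()                      // overwrites the front of the buffer
	fmt.Println(string(aliased))     // no longer "hello": silently corrupted

	b2 := &miniBuf{data: []byte("hello-world")}
	copied := b2.readBytesCopied(5)
	b2.compact()
	fmt.Println(string(copied)) // still "hello"
}
```

Copying only in stream mode keeps the zero-copy fast path for the in-memory case, at the cost of one allocation per []byte field when streaming.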
also noticed: easiest fix is probably just routing the multi-byte case through fill(). Happy to discuss if I'm misreading the flow here.
Hey @Zakir032002
Hii @Zakir032002
But I think you misunderstood the fast-path check:

```go
if len(b.data)-readIdx >= 5 {
	// fast path runs here
}
```

If we are near a chunk boundary (less than 5 bytes remaining in the buffer), the execution completely skips this branch.
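To make the fast/slow split concrete, here is a self-contained sketch (hypothetical helper, not the real `ReadVaruint` code) of a LEB128-style varint read: the 5-byte fast path is only entered when enough bytes are already buffered, and near a chunk boundary the bounds-checked slow path runs instead.

```go
package main

import "fmt"

// readVarUint32 is a hypothetical illustration of the fast/slow split.
// It assumes a well-formed varint in the input.
func readVarUint32(data []byte, readIdx int) (v uint32, n int) {
	if len(data)-readIdx >= 5 {
		// fast path: up to 5 bytes are known to be in bounds,
		// so no per-byte bounds check is needed.
		for shift := uint(0); ; shift += 7 {
			b := data[readIdx+n]
			n++
			v |= uint32(b&0x7f) << shift
			if b&0x80 == 0 {
				return v, n
			}
		}
	}
	// slow path near a chunk boundary: bounds-check every byte.
	for shift := uint(0); readIdx+n < len(data); shift += 7 {
		b := data[readIdx+n]
		n++
		v |= uint32(b&0x7f) << shift
		if b&0x80 == 0 {
			break
		}
	}
	return v, n
}

func main() {
	// 300 encodes as 0xAC 0x02; with only 2 bytes buffered the >=5
	// fast-path condition fails and the slow path decodes it.
	v, n := readVarUint32([]byte{0xAC, 0x02}, 0)
	fmt.Println(v, n)
}
```

Both paths decode the same value; the only difference is whether bounds checks are hoisted out of the loop.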
I've added the
@chaokunyang |
```go
copy(b.data, b.data[b.readerIndex:])
b.writerIndex -= b.readerIndex
b.readerIndex = 0
b.data = b.data[:b.writerIndex]
```
Why do we need to update writerIndex here?
we're using a sliding-window approach to push the unread bytes to the front, so if we only set readerIndex = 0 and keep writerIndex at its old position, it would track garbage values left in between.
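A tiny runnable sketch of this bookkeeping (hypothetical `window` type mirroring the snippet above): after sliding the unread bytes to the front, writerIndex must shrink by the same amount, otherwise the stale bytes between the new and old write positions would look like valid data.

```go
package main

import "fmt"

// window is a hypothetical stand-in for the buffer's index bookkeeping.
type window struct {
	data        []byte
	readerIndex int
	writerIndex int
}

// compact slides unread bytes [readerIndex, writerIndex) to the front
// and rebases both indices, exactly as in the snippet under review.
func (w *window) compact() {
	copy(w.data, w.data[w.readerIndex:w.writerIndex])
	w.writerIndex -= w.readerIndex
	w.readerIndex = 0
	w.data = w.data[:w.writerIndex]
}

func main() {
	// 4 consumed bytes ("xxxx"), 3 unread bytes ("ABC").
	w := &window{data: []byte("xxxxABC"), readerIndex: 4, writerIndex: 7}
	w.compact()
	fmt.Println(string(w.data), w.readerIndex, w.writerIndex) // ABC 0 3
}
```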
go/fory/buffer.go (outdated)

```diff
-	if b.readerIndex+8 > len(b.data) {
-		*err = BufferOutOfBoundError(b.readerIndex, 8, len(b.data))
-		return 0
+	if !b.fill(8, nil) {
```
Why do you pass nil? That will just silence the error.
my bad. i'll fix it up
go/fory/buffer.go (outdated)

```diff
 //go:inline
 func (b *ByteBuffer) ReadVaruint36Small(err *Error) uint64 {
-	if b.remaining() >= 8 {
+	if b.remaining() >= 8 || (b.reader != nil && b.fill(8, nil)) {
```
If it's a network stream and the writer doesn't close it, EOF never comes, and this will just hang forever.
yes, we should never block on pre-filling 8 bytes. let me fix it.
Some design suggestions:

And please run benchmarks/go for the current branch and the apache/main branch, and paste the results here.
Benchmark: apache/fory/main vs this branch (feat/go-deserialization)
apache/fory/main:
this branch:
@chaokunyang |
go/fory/test_helper_test.go (outdated)

```go
// Create a new stream reader. The stream context handles boundaries and compactions.
streamReader := NewInputStream(stream)
err = f.DeserializeFromStream(streamReader, v)
```
This helper deserializes into the same target twice (first from bytes, then from stream). That can mask stream-path bugs because values populated by the first pass may still be present if the second pass does not fully overwrite fields.
Please deserialize the stream path into a fresh value and compare results, so partial/incorrect stream decoding is detectable.
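The suggested helper shape can be sketched like this (the decode functions below are stand-ins for the real bytes-path and stream-path deserializers, and all names are hypothetical): decode each path into its own fresh value, then compare, so leftover state from pass one can't mask a stream-path bug.

```go
package main

import (
	"fmt"
	"reflect"
)

// checkBothPaths decodes into two independent fresh targets and
// compares the results. A stream-path decoder that only partially
// populates its target cannot hide behind the bytes-path result.
func checkBothPaths(decodeBytes, decodeStream func(v any) error, newTarget func() any) error {
	fromBytes := newTarget()
	if err := decodeBytes(fromBytes); err != nil {
		return err
	}
	fromStream := newTarget() // fresh value: no leftovers from pass one
	if err := decodeStream(fromStream); err != nil {
		return err
	}
	if !reflect.DeepEqual(fromBytes, fromStream) {
		return fmt.Errorf("paths disagree: %v vs %v", fromBytes, fromStream)
	}
	return nil
}

type payload struct{ A, B int }

func main() {
	ok := func(v any) error { *(v.(*payload)) = payload{1, 2}; return nil }
	buggy := func(v any) error { v.(*payload).A = 1; return nil } // forgets B
	fresh := func() any { return &payload{} }
	fmt.Println(checkBothPaths(ok, ok, fresh))    // nil: paths agree
	fmt.Println(checkBothPaths(ok, buggy, fresh)) // non-nil: B was never set
}
```

Had `buggy` reused the target already filled by `ok`, field B would still read 2 and the bug would be invisible, which is exactly the masking described above.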
go/fory/stream_test.go (outdated)

```go
reader := &slowReader{data: data}
var decoded StreamTestStruct
// Use small minCap (16) to force frequent fills and compactions
f.readCtx.buffer.ResetWithReader(reader, 16)
```
This setup line does not affect the code path under test, because DeserializeFromReader immediately calls ResetWithReader(r, 0) internally and replaces the min-cap you set here.
If the goal is to verify small-cap refill/compaction behavior, consider using DeserializeFromStream with NewInputStreamWithMinCap(..., 16) so the configured capacity is actually used.
go/fory/stream.go (outdated)

```go
// NewInputStreamWithMinCap creates a new InputStream with a specified minimum buffer capacity.
func NewInputStreamWithMinCap(r io.Reader, minCap int) *InputStream {
```
This is a buffer size, not a capacity, please change the API name.
i'll change it to NewInputStreamWithBufferSize. does that sound good? It would also match the API implemented in the C++ stream deserialization.
Why?

To enable stream-based deserialization in Fory's Go library, allowing for direct reading from `io.Reader` without pre-buffering the entire payload. This improves efficiency for network and file-based transport and brings the Go implementation into feature parity with the Python and C++ libraries.

What does this PR do?
1. Stream Infrastructure in `go/fory/buffer.go`

Enhanced `ByteBuffer` to support `io.Reader` with an internal sliding window and automatic filling:
- Added `reader io.Reader` and `minCap int` fields.
- Added `fill(n int, err *Error) bool` for on-demand data fetching and buffer compaction.
- Added `CheckReadable(n)` and `Skip(n)`, memory-safe routines that pull from the underlying stream when necessary to avoid out-of-bounds panics.
- Updated `ReadBinary` and `ReadBytes` to safely copy slices when streaming, to prevent silent data corruption on compaction.
- Updated the `Read*` methods (fixed-size, varint, tagged) to fetch data from the reader safely if not cached.

2. Stateful InputStream in `go/fory/stream.go`

Added the `InputStream` feature to support true, stateful sequential stream reads:
- Added `InputStream`, which persists the buffered byte window and TypeResolver metadata (Meta Sharing) across multiple object decodes on the same stream, decoupled from `Fory` to mirror the C++ `ForyInputStream` implementation.
- Added the `fory.DeserializeFromStream(is, target)` method to process continuous streamed data.
- Added a `Shrink()` method to compact the internal buffer and reclaim memory during long-lived streams.
- Added the `DeserializeFromReader` method as an API for simple one-off stream object reads.

3. Stream-Safe Deserialization Paths

Updated the internal deserialization pipelines in `struct.go` and `type_def.go` to be stream-safe:
- Added `CheckReadable` bounds-checking to the `struct.go` fast paths for fixed-size primitives.
- Updated type-def skipping (`skipTypeDef`) in `type_def.go` to use bounds-checked `Skip()` rather than unbounded `readerIndex` overrides.

4. Comprehensive Stream Tests

- Added a `oneByteReader` wrapper (`go/fory/test_helper_test.go`) that artificially feeds the deserialization engine exactly 1 byte at a time.
- Updated the existing suites (`struct_test.go`, `primitive_test.go`, `slice_primitive_test.go`, etc.) to run all standard tests through this aggressive 1-byte fragmented stream reader via a new `testDeserialize` helper, to guarantee total stream robustness.

Related issues
Closes #3302
Does this PR introduce any user-facing change?

New public APIs (`NewInputStream`, `DeserializeFromStream`, `DeserializeFromReader`, `NewByteBufferFromReader`).

Benchmark
Main branch -

This branch -
